We address the general task of structured commonsense reasoning: given a natural language input, the goal is to generate a graph such as an event -- or a reasoning-graph. To employ large language models (LMs) for this task, existing approaches ``serialize'' the output graph as a flat list of nodes and edges. Although feasible, these serialized graphs strongly deviate from the natural language corpora that LMs were pre-trained on, hindering LMs from generating them correctly. In this paper, we show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all. We demonstrate our approach across three diverse structured commonsense reasoning tasks. In all these natural language tasks, we show that using our approach, a code generation LM (CODEX) outperforms natural-LMs that are fine-tuned on the target task (e.g., T5) and other strong LMs such as GPT-3 in the few-shot setting.
translated by 谷歌翻译
正如最近的作品中观察到的那样,通信图神经网络(GNN)中信号传播的质量强烈影响其表现力。特别是,对于依靠远程相互作用的预测任务,节点特征的递归聚合可能导致不希望的现象称为“过句”。我们提出了一个基于信息收缩的分析过度句子的框架。我们的分析以可靠计算的模型为指导,该模型由于冯·诺伊曼(Von Neumann),该模型在嘈杂的计算图中提供了新的洞察力作为信号淬灭的新见解。在此基础上,我们提出了一个旨在减轻过度量化的算法的图形。我们的算法采用了由扩展器图构造动机的随机局部边缘翻转原始的。我们将算法的光谱膨胀特性与现有基于曲率的非本地重新布线策略的光谱膨胀属性进行了比较。合成实验表明,尽管我们的算法通常具有较慢的膨胀速率,但总体计算更便宜,可以准确地保留节点度,并且永远不会断开图表的连接。
translated by 谷歌翻译
天然语言对代码模型学会生成具有自然语言(NL)意图的代码段。但是,由于每天引入新的库和功能,因此不可能使用培训示例来覆盖所有API的公开库和专有库和功能的快速增长。因此,现有模型本质上不能仅通过将它们纳入培训数据而概括地使用看不见的功能和库。相反,当人类程序员编写程序时,他们经常指文本资源,例如代码手册,文档和教程,以探索和理解可用的库功能。受此观察的启发,我们介绍了Doccoder:一种方法,该方法通过(1)检索给定NL意图的相关文档明确利用代码手册和文档,以及(2)基于NL意图和检索到的文档生成代码。我们的方法是一般的,可以应用于任何编程语言,并且对基础神经模型不可知。我们证明,Doccoder始终改善NL-TO-代码模型:DOCCODER在新的Bash数据集TLDR上的强基准比强基础高11倍;在受欢迎的Python Conala基准中,Doccoder在强大的基线上提高了1.65 BLEU。
translated by 谷歌翻译
基于检索的语言模型(R-LM)通过将标准语言模型(LM)与在测试时从外部数据存储中检索的示例结合使用自然语言文本的概率。虽然有效,但在实践中使用这些模型的主要瓶颈是计算昂贵的数据存储搜索,可以像每个时间步骤一样频繁地执行。在本文中,我们提出了retomaton-检索自动机 - 基于(1)在连续的数据存储条目之间保存指针,以及(2)将条目聚类到“状态”中。这有效地导致了在数据存储顶部构建的加权有限自动机,而不是将数据存储表示为平面列表。自动机的创建是无监督的,可以从任何文本集合中构造一个retomaton:原始训练语料库或另一个域。在推理时与LM推理并行遍历此自动机,将其困惑降低到1.85,或者可节省多达$ k $ nn-lm的最近邻居搜索的83%(Khandelwal等,2020年,没有),没有伤害困惑。我们的代码和训练有素的模型可在https://github.com/neulab/retomaton上找到。
translated by 谷歌翻译
Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating longer sequences. We define attention resolution as an indicator of extrapolation. Then we propose two designs to improve the above metric of Transformers. Specifically, we introduce a relative position embedding to explicitly maximize attention resolution. Moreover, we use blockwise causal attention during inference for better resolution. We evaluate different Transformer variants with language modeling. Experimental results show that our model achieves strong performance in both interpolation and extrapolation settings. The code will be available at https://aka.ms/LeX-Transformer.
translated by 谷歌翻译
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd-sourcing. This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure. Our analysis reveals that traditional loss functions can struggle to effectively incorporate the DAG structure, leading us to propose a causality-enhanced method called Exponential Maximum Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models. To evaluate the effectiveness of this approach, we have built a comprehensive benchmark using the CausalDialogue dataset leveraging large-scale pre-trained language models, and have assessed the results through both human and automatic evaluation metrics for coherence, diversity, and agility. Our findings show that current techniques are still unable to effectively address conversational DAGs, and that the ExMATE method can improve the diversity and agility of conventional loss functions while maintaining coherence.
translated by 谷歌翻译
Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference. While there is a growing number of sophisticated methods that aim to eliminate interference, our understanding of interference as a phenomenon is still limited. This work identifies the main factors that contribute to interference in multilingual machine translation. Through systematic experimentation, we find that interference (or synergy) are primarily determined by model size, data size, and the proportion of each language pair within the total dataset. We observe that substantial interference occurs mainly when the model is very small with respect to the available training data, and that using standard transformer configurations with less than one billion parameters largely alleviates interference and promotes synergy. Moreover, we show that tuning the sampling temperature to control the proportion of each language pair in the data is key to balancing the amount of interference between low and high resource language pairs effectively, and can lead to superior performance overall.
translated by 谷歌翻译
Business processes that involve AI-powered automation have been gaining importance and market share in recent years. These business processes combine the characteristics of classical business process management, goal-driven chatbots, conversational recommendation systems, and robotic process automation. In the new context, prescriptive process monitoring demands innovative approaches. Unfortunately, data logs from these new processes are still not available in the public domain. We describe the main challenges in this new domain and introduce a synthesized dataset that is based on an actual use case of intelligent process automation with chatbot orchestration. Using this dataset, we demonstrate crowd-wisdom and goal-driven approaches to prescriptive process monitoring.
translated by 谷歌翻译
We construct a universally Bayes consistent learning rule that satisfies differential privacy (DP). We first handle the setting of binary classification and then extend our rule to the more general setting of density estimation (with respect to the total variation metric). The existence of a universally consistent DP learner reveals a stark difference with the distribution-free PAC model. Indeed, in the latter DP learning is extremely limited: even one-dimensional linear classifiers are not privately learnable in this stringent model. Our result thus demonstrates that by allowing the learning rate to depend on the target distribution, one can circumvent the above-mentioned impossibility result and in fact, learn \emph{arbitrary} distributions by a single DP algorithm. As an application, we prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal \emph{labeled} sample complexity of $\tilde{O}(d/\varepsilon)$ labeled examples (and with an unlabeled sample complexity that can depend on the target distribution).
translated by 谷歌翻译
Out-of-distribution (OOD) detection has attracted a large amount of attention from the machine learning research community in recent years due to its importance in deployed systems. Most of the previous studies focused on the detection of OOD samples in the multi-class classification task. However, OOD detection in the multi-label classification task remains an underexplored domain. In this research, we propose YolOOD - a method that utilizes concepts from the object detection domain to perform OOD detection in the multi-label classification task. Object detection models have an inherent ability to distinguish between objects of interest (in-distribution) and irrelevant objects (e.g., OOD objects) on images that contain multiple objects from different categories. These abilities allow us to convert a regular object detection model into an image classifier with inherent OOD detection capabilities with just minor changes. We compare our approach to state-of-the-art OOD detection methods and demonstrate YolOOD's ability to outperform these methods on a comprehensive suite of in-distribution and OOD benchmark datasets.
translated by 谷歌翻译